Jasper Slingsby
Uncertainty determines the utility of a forecast:
If the uncertainty in a forecast is too high, then it is of no utility to a decision maker.
If the uncertainty is not properly quantified and presented, it can lead to poor decisions.
This leaves forecasters with four overarching questions:
The utility of a model/forecast depends on how quickly the forecast loses proficiency with lead time, combined with the forecast proficiency threshold required by the decision maker. Together these determine the “ecological forecast horizon” (Petchey et al. 2015).
The ecological forecast horizon (from Petchey et al. 2015).
Some forecasts may lose proficiency very quickly, crossing (or even starting below) the forecast proficiency threshold, giving a short (or zero) forecast horizon. Conversely, if the forecast loses proficiency more slowly, or the proficiency threshold requirements are lower, the forecast horizon extends further into the future.
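To make the idea concrete, here is a minimal sketch (hypothetical numbers, not from Petchey et al.) that computes a forecast horizon as the first lead time at which a decaying proficiency score drops below the decision maker's threshold:

```python
import numpy as np

def forecast_horizon(proficiency, threshold):
    """Return the first lead time (index) at which proficiency
    falls below the threshold, or None if it never does."""
    below = np.where(proficiency < threshold)[0]
    return int(below[0]) if below.size else None

# Hypothetical proficiency score (e.g. a correlation skill score)
# that decays exponentially with lead time (days into the future).
lead_times = np.arange(0, 30)
proficiency = np.exp(-lead_times / 10)  # decays from 1 toward 0

print(forecast_horizon(proficiency, 0.5))  # higher threshold -> shorter horizon
print(forecast_horizon(proficiency, 0.2))  # lower threshold -> longer horizon
```

Note how the same decay curve yields different horizons depending on the proficiency threshold demanded by the decision maker.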
Dietze provides a nice classification of prediction uncertainty in his book (Dietze 2017a) and subsequent paper (Dietze 2017b) in the form of an equation (note that I’ve spread it over multiple lines):
\[
\underbrace{Var[Y_{t+1}]}_\text{predictive variance} \approx \;
\underbrace{stability \times uncertainty}_\text{initial conditions} \; + \\
\underbrace{sensitivity \times uncertainty}_\text{drivers} \; + \\
\underbrace{sensitivity \times (uncertainty + variability)}_\text{parameters + random effects} \; + \\
\underbrace{Var[\epsilon]}_\text{process error}
\]
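One way to see how each term contributes is a Monte Carlo sketch (a toy linear model of my own devising, not Dietze's code): simulate the one-step forecast many times, varying one source of uncertainty at a time, and compare the resulting predictive variances.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Toy one-step model: Y_{t+1} = a * Y_t + b * X + process error,
# with hypothetical uncertainty in each ingredient (sd arguments).
def simulate(y0_sd=0.0, x_sd=0.0, par_sd=0.0, proc_sd=0.0):
    y0 = rng.normal(1.0, y0_sd, n)    # initial conditions
    x = rng.normal(2.0, x_sd, n)      # driver
    a = rng.normal(0.8, par_sd, n)    # parameter
    b = 0.5
    eps = rng.normal(0.0, proc_sd, n) # process error
    return np.var(a * y0 + b * x + eps)

print(simulate(y0_sd=0.2))   # initial-condition contribution
print(simulate(x_sd=0.5))    # driver contribution
print(simulate(par_sd=0.1))  # parameter contribution
print(simulate(proc_sd=0.3)) # process-error contribution
```

Each contribution reflects both the uncertainty in that ingredient and how strongly the model transmits it (e.g. the initial-condition term scales with the stability coefficient `a`), mirroring the stability/sensitivity multipliers in the equation.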
Bayes’ Rule:
\[ \underbrace{p(\theta|D)}_\text{posterior} \; \propto \; \underbrace{p(D|\theta)}_\text{likelihood} \;\; \underbrace{p(\theta)}_\text{prior} \; \]
The posterior is proportional to the likelihood times the prior.
The posterior is the conditional probability of the parameters given the data, \(p(\theta|D)\), and provides a probability distribution for the values each parameter can take.
This allows us to represent uncertainty in the model and forecasts as probabilities, which is powerful for indicating the probability of our forecast being correct.
The likelihood \(p(D|\theta)\) represents the probability of the data \(D\) given the model with parameter values \(\theta\), and is used in analyses to find the likelihood profiles of the parameters.
In Maximum Likelihood Estimation, this term is used to find the best estimate of the parameters: for a given model, we choose the parameter values that maximize the probability of the data.
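As a quick sketch (a toy example of my own, not from the practical), maximizing the likelihood of normally distributed data with an unknown mean recovers the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)  # hypothetical data D

# Negative log-likelihood of the data given a candidate mean theta
# (sd fixed at its true value to keep the example one-dimensional).
def neg_log_lik(theta):
    return -np.sum(norm.logpdf(data, loc=theta, scale=2.0))

result = minimize_scalar(neg_log_lik)  # minimizing -logL maximizes L
print(result.x)        # MLE of the mean
print(np.mean(data))   # matches the sample mean for this model
```

In practice we work with the log-likelihood (and minimize its negative) because products of many small probabilities underflow numerically.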
The prior is the marginal probability of the parameters, \(p(\theta)\).
It represents the credibility of the parameter values, \(\theta\), without the data, and is specified using our prior belief of what the parameters should be, before interrogating the data. This provides a formal probabilistic framework for the scientific method, in that new evidence must be considered in the context of previous knowledge, providing the opportunity to update our beliefs.
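A minimal numeric sketch of posterior \(\propto\) likelihood \(\times\) prior (a toy coin-flip example of my own, evaluated on a grid of candidate \(\theta\) values):

```python
import numpy as np
from scipy.stats import binom, beta

# Hypothetical data: 7 successes in 10 trials.
k, n = 7, 10

# Grid of candidate parameter values theta (probability of success).
theta = np.linspace(0.001, 0.999, 999)

prior = beta.pdf(theta, 2, 2)        # prior belief: centred on 0.5
likelihood = binom.pmf(k, n, theta)  # p(D | theta)
posterior = likelihood * prior       # proportional to p(theta | D)
posterior /= posterior.sum()         # normalize so it sums to 1 over the grid

# Updating: the posterior is pulled from the prior toward the data.
print(theta[np.argmax(prior)])       # prior mode at 0.5
print(theta[np.argmax(posterior)])   # posterior mode between 0.5 and 0.7
```

This is the updating of beliefs in action: the posterior mode sits between the prior's mode (0.5) and the data's proportion (0.7), weighted by how informative each is.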
Data can enter (or be fused with) a model in a variety of ways. Here we’ll discuss these and then give an example of the Fynbos postfire recovery model used in the practical.
The opportunities for data fusion are linked to model structure, so we’ll revisit how some aspects of model structure change as we move from Least Squares to Maximum Likelihood Estimation to “single-level” Bayes to Hierarchical Bayes and the data fusion opportunities provided by each.
Conceptually (and perhaps over-simplistically), one can think of the changes in model structure as the addition of model layers, each of which provides more opportunities for data fusion.
Least Squares makes no distinction between the process model and the data model.
the process model models the drivers determining the observed pattern (i.e. it is the model equation you will be familiar with, such as a linear model)
a data model models the observation error or data observation process, i.e. the factors that may cause mismatch between the process model and the data
in least squares the data model can only ever be a normal (also called Gaussian) distribution, because we require homogeneity of variance in order to minimize the sums of squares
the only opportunity to add data to a least squares model is via the process model